Reserve the first level headings (#) for the start of a new Module. This will help to organize your portfolio in an intuitive fashion.
Note: Please edit this template to your heart’s content. This is meant to be the armature upon which you build your individual portfolio. You do not need to keep this instructive text in your final portfolio, although you do need to keep module and assignment names so we can identify what is what.
The remaining second level headers (##) are for separating data science Friday, regular course, and project content. In this module, you will only need to include data science Friday and regular course content; projects will come later in the course.
Third level headers (###) should be used for links to assignments, evidence worksheets, problem sets, and readings, as seen here.
Use this space to include your installation screenshots.
Detail the code you used to create, initialize, and push your portfolio repo to GitHub. This will be helpful as you will need to repeat many of these steps to update your portfolio throughout the course.
git add to stage the changes,
and use git status to check what has been staged
Paste your code from the in-class activity of recreating the example html.
version January 18, 2018
The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.
http://phdcomics.com/ Comic posted 1-17-2018
The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)
hint: go to the PhD Comics website to see if you can find the image above. If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown.
Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).
Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:
1231521+12341556280987
## [1] 1.234156e+13
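R prints this sum in scientific notation by default, which hides most of its digits. As a minimal sketch (the variable names `total` and `total_fixed` are illustrative, not part of the original assignment), base R's `format()` can show the full value:

```r
# Add the same two numbers as in the example above
total <- 1231521 + 12341556280987

# Default printing keeps 7 significant digits, so R falls back to scientific notation
print(total)

# format() with scientific = FALSE returns the value as fixed-notation text
total_fixed <- format(total, scientific = FALSE)
print(total_fixed)
```

This is only a display choice; the stored number is the same either way.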
Or maybe, after you’ve added those numbers, you feel like it’s about time for a table! I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.
library(knitr)
kable(summary(cars),caption="I made this table with kable in the knitr package library")
| speed | dist |
|---|---|
| Min. : 4.0 | Min. : 2.00 |
| 1st Qu.:12.0 | 1st Qu.: 26.00 |
| Median :15.0 | Median : 36.00 |
| Mean :15.4 | Mean : 42.98 |
| 3rd Qu.:19.0 | 3rd Qu.: 56.00 |
| Max. :25.0 | Max. :120.00 |
And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun gif of your choice!
The template for the first Evidence Worksheet has been included here. The first thing for any assignment should be link(s) to any relevant literature (which should be included as full citations in a module references section below).
You can copy-paste in the answers you recorded when working through the evidence worksheet into this portfolio template.
As you include Evidence worksheets and Problem sets in the future, ensure that you delineate Questions/Learning Objectives/etc. by using headers that are 4th level and greater. This will still create header markings when you render (knit) the document, but will exclude these levels from the Table of Contents. That’s a good thing. You don’t want to clutter the Table of Contents too much.
Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.
How do we estimate the prokaryotic population of the world? And what is it made up of?
What are the uncertainties that come with this measurement?
Which environments contain the most prokaryotic biomass?
How does this biomass affect global nutrient cycles? (e.g. P, C, N)
Prokaryotic estimates were based upon average data from the following four environments: aquatic environments, soil, subsurface, and “other habitats” including in or on animal or plant surfaces or in the air. They used experimentally derived values to perform these calculations, but interestingly, not the same value sets for each environment. For example, some calculations included cell volume, while others included just the area of the environment. Vi
Also, they compared their calculated values with some from other papers, which resulted in some differences that they attempted to explain.
They found that prokaryotes contain about half of the organic carbon on earth, and 90% of the nutrients (compared to plants). In brief, the prokaryotic biomass, and thus its contribution to global cycles, is very large, doubling estimates of the amount of carbon stored in living organisms globally. They broke down the calculations into four environments: aquatic environments, soil, subsurface, and “other habitats”.
Aquatic environments - this includes the open ocean, sediment in the ocean, freshwater and saline lakes (3 orders of magnitude less) and polar regions. Prokaryotes are ubiquitous in these environments: $1180 \times 10^{26}$ cells.
Soil - surprisingly, there are fewer prokaryotes in forest soils than in other soils. The estimates varied by ecosystem. $255.6 \times 10^{27}$ cells.
Subsurface - e.g. terrestrial habitats below 8 m and marine sediments below 10 cm (this includes groundwater too). This environment is difficult to estimate because it is difficult to obtain uncontaminated samples. However, it has been suggested to be enormous. $3.8 \times 10^{30}$ cells.
Other environments - discussed the prokaryotes that live on animals, insects, and plants, and also those in the air/atmosphere. $53.024 \times 10^{23}$ cells (several orders of magnitude smaller).
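To compare the four estimates on one scale, the figures listed above can be added up in R. This is a rough sketch that takes the numbers exactly as recorded in these notes; the vector name `cells` and the other variable names are illustrative only.

```r
# Cell counts per habitat, copied from the notes above
cells <- c(
  aquatic    = 1180 * 1e26,
  soil       = 255.6 * 1e27,
  subsurface = 3.8 * 1e30,
  other      = 53.024 * 1e23
)

# Total across habitats and each habitat's share of that total
total_cells <- sum(cells)
share_percent <- 100 * cells / total_cells

print(signif(total_cells, 3))
print(round(share_percent, 2))
```

With these inputs the subsurface dominates the total, which is why the uncertainty in that single estimate matters so much.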
These large numbers mean that not only carbon, but N and P are stored in globally significant amounts in prokaryotes.
Disproves Kluyver’s estimate that 1/2 of the living protoplasm on earth is microbial - likely this number is far too conservative. The paper also discusses growth rates to estimate cell turnover, and fluxes in and out of these environments.
In subsurface environments, the turnover time of cells seems exceedingly large. Is this a good estimate?
Where does the energy in the subsurface environments come from? Photosynthesis? Chemolithotrophy?
From the passage “in the polar regions, a relatively dense community of algae and prokaryotes forms at the water-ice interface” - why does this occur?
How accurate can these calculations be if they are based upon just a few estimates?
How much flux occurs among all of these prokaryotic environments? Especially the subsurface environment, if so many cells are hypothesized to be metabolically inactive, how much flux can occur? Is it more of a pool than a flux?
Whitman et al. 1998; Kasting JF and Siefert JL. 2003.
Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.
The primary prokaryotic habitats on earth are split into aquatic habitats, soil, and subsurface habitats. According to table 5 of the text, there are $12 \times 10^{28}$ cells in aquatic habitats, $26 \times 10^{28}$ cells in the soil, and interestingly, $355 \times 10^{28}$ and $25$-$250 \times 10^{28}$ prokaryotic cells in the oceanic and terrestrial subsurface respectively. However, in order to rank these habitats based on their capacity to support life, we must come up with a universal definition for “capacity to support life”. If you were to define this as the total number of prokaryotic cells in a given habitat, it would appear that the oceanic subsurface habitat has the greatest capacity to support life. However, this does not take into account whether these cells are metabolically active, or their turnover time, or the total area occupied by the habitat.
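The ranking by total cell number can be sketched in R. This assumes "capacity to support life" simply means cell count, which the answer above already flags as a simplification; the data frame and column names are hypothetical.

```r
# Cell counts in units of 10^28 cells, as quoted from Table 5 in the answer above.
# The terrestrial subsurface is given as a range, so both bounds are kept.
habitats <- data.frame(
  habitat = c("aquatic", "soil", "oceanic subsurface", "terrestrial subsurface"),
  low     = c(12, 26, 355, 25),
  high    = c(12, 26, 355, 250)
)

# Rank habitats from most to fewest cells, using the upper bound of each estimate
ranked <- habitats[order(habitats$high, decreasing = TRUE), ]
print(ranked)
```

Ranking on the `low` column instead would move the terrestrial subsurface below the soil, which shows how sensitive the ordering is to that one range.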
$2.8 \times 10^{28}$ cells in the upper 200 m
The average density is $5 \times 10^{5}$ cells/mL
To calculate what fraction of these cells are cyanobacteria:
$(4 \times 10^{4}\ \text{cells/mL}) / (5 \times 10^{5}\ \text{cells/mL}) \times 100 = 8\%$
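The same percentage can be checked in R as a quick sketch; the variable names are illustrative and the two densities are the ones quoted above.

```r
# Densities quoted above, in cells per mL
cyano_density <- 4e4   # cyanobacteria
total_density <- 5e5   # all prokaryotes in the upper 200 m

# Cyanobacteria as a percentage of all prokaryotic cells
cyano_percent <- 100 * cyano_density / total_density
print(cyano_percent)
```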
This ratio is significant because these cells are autotrophs, which means that they are responsible for assimilating inorganic carbon into this environment, and thus are an important aspect of carbon cycling in the ocean. This is not only important for aquatic environments, but for the atmospheric composition of the earth as well. This is because some organic carbon fixed by these autotrophs is not respired and is instead stored in marine sediment. Since respiring this material generally requires oxygen, its long-term storage means that oxygen can remain at significant levels in the earth’s atmosphere. This is in contrast to terrestrial systems, where carbon fixed by autotrophs is generally respired.
Autotroph - produces complex organic carbon compounds from simple inorganic substances such as carbon dioxide.
Heterotroph - takes up organic carbon to produce energy and synthesize compounds.
Lithotroph - uses an inorganic substrate to obtain reducing equivalents for use in biosynthesis or energy conservation via aerobic or anaerobic respiration.
Temperature is the limiting factor in subsurface environments, and at around 4 km in terrestrial environments the temperature reaches 125 degrees Celsius. This is the generally agreed upon temperature limit of prokaryotic life.
If the deepest habitat capable of supporting prokaryotic life is the Mariana Trench, which is 10.9 km deep, then cellular life should be able to persist another 4 km deeper in the subsurface, so 14.9 km in total.
Mount Everest is 8.8 km high, so that would be the highest terrestrial habitat capable of supporting prokaryotic life.
Additionally, in the text, bacteria in the air were discussed. However, are these bacteria actually metabolically active? They could just be spore formers, or metabolically inactive until they reach an environment that is more viable.
The paper stated 77 km, but this does not seem very realistic, because the limiting factors in these environments include nutrient availability, UV radiation, and temperature. So I would say more like 20 km high.
I would say that the vertical distance of the Earth’s biosphere is a range of 24 km - 34 km (due to the Mariana trench)
Population size x turnovers/year = cells/year
Marine heterotrophs: $3.6 \times 10^{28}$ cells x 365 days/year / 16 days/turnover = $8.2 \times 10^{29}$ cells/year
Assuming the carbon efficiency is 20%. If there is around 5-20 fg of carbon in a prokaryotic cell, 20 fg C/cell = $20 \times 10^{-30}$ Pg C/cell
$3.6 \times 10^{28}$ cells x $20 \times 10^{-30}$ Pg C/cell = 0.72 Pg C are trapped in marine heterotrophs. To calculate the total carbon flux, we should multiply this value by 5, but the authors used 4, so 4 x 0.72 = 2.88 Pg C per turnover
51 Pg C/year; 85% of the carbon in the photic zone is consumed = 43 Pg C/year
43 Pg C/year / 2.88 Pg C per turnover = 14.9 turnovers/year, or 1 turnover about every 24.4 days
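The chain of arithmetic above is easy to mistype, so here is a hedged R sketch that re-runs it step by step. It uses only the inputs quoted in this answer (population size, 16-day turnover, 20 fg C per cell, the factor of 4, and 43 Pg C/year consumed); all variable names are illustrative.

```r
# Inputs quoted in the answer above
population    <- 3.6e28   # marine heterotroph cells
turnover_days <- 16       # days per turnover

# Cells produced per year = population x turnovers per year
cells_per_year <- population * 365 / turnover_days

# 20 fg C per cell, converted to Pg (1 fg = 1e-15 g, 1 Pg = 1e15 g)
carbon_per_cell_pg <- 20e-15 / 1e15
standing_carbon_pg <- population * carbon_per_cell_pg

# Carbon needed per turnover, using the authors' factor of 4
carbon_per_turnover_pg <- 4 * standing_carbon_pg

# Carbon consumed in the photic zone per year (85% of 51, rounded as above)
consumed_pg_per_year <- 43

turnovers_per_year <- consumed_pg_per_year / carbon_per_turnover_pg
days_per_turnover  <- 365 / turnovers_per_year

print(signif(cells_per_year, 2))
print(standing_carbon_pg)
print(round(turnovers_per_year, 1))
print(round(days_per_turnover, 1))
```

Keeping each intermediate value in its own variable makes it simple to swap the factor of 4 for 5 and see how much the turnover estimate shifts.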
This varies with depth in the ocean due to access to sunlight, as photosynthesis provides the energy necessary for carbon fixation. In terrestrial habitats, a number of factors including differences in depth, sediment, nutrient availability, and cell density contribute to the differences in carbon fixation and turnover rate.
Given the large population size and high mutation rate of prokaryotic cells, this indicates that prokaryotic cells have a very large adaptive potential. As prokaryotes have existed on the earth for billions of years, this corresponds to an incredibly large amount of genetic diversity.
However, it is foolhardy to assume that point mutations are the only way that microbial genomes diversify and adapt. Infection by bacteriophages can transport foreign DNA into a cell, some cells can share plasmids using conjugative pili, and others can take up exogenous DNA from their environment. Additionally, other mutation events can occur independent of point mutations in a single cell, such as gene duplication or deletion.
Based on the information provided in the text, I would say that prokaryotic abundance and metabolic potential are closely related to the diversity of organisms present in the biosphere. Given the large abundance of prokaryotic life, and its correspondingly large mutation rate, over the course of geologic time this has generated an incredibly diverse set of metabolic capabilities among individual prokaryotes. These different metabolic abilities have enabled prokaryotes to colonize the entire biosphere (24-34 km of the earth’s surface, subsurface, and lower atmosphere).
Comment on the emergence of microbial life and the evolution of Earth systems
Indicate the key events in the evolution of Earth systems at each approximate moment in the time series. If times need to be adjusted or added to the timeline to fully account for the development of Earth systems, please do so.
According to Nisbet et al, our solar system began after one or multiple supernova explosions. At this time, the inner planets (including earth) were formed from collisions between “planetesimals” - debris about the size of the earth’s moon.
+ 4.1 billion years ago
The suggested origin of (microbial) life according to Nisbet et al. Due to the intrinsically difficult nature of establishing an exact date, the origin of life is instead given as a range: 4.0 +/- 0.2 Gyr. This date is supported by specific carbon isotope signatures in zircons.
+ 3.8 billion years ago
According to Nisbet et al, the earth suffered “frequent massive meteorite impacts”, some of which were large enough to cause the liquid water in the oceans to become steam.
This is when the earliest sedimentary rocks were found, which indicates that there were liquid oceans.
+ 3.5 billion years ago
Fossil evidence of microbial biofilms and stromatolites first appears. Additionally, isotopic evidence for life increases. This is when Rubisco, the enzyme necessary for oxygenic photosynthesis, was thought to have developed. This is also when LUCA, the Last Universal Common Ancestor of all life, was thought to have lived.
+ 3.0 billion years ago
First global glaciation event, caused by increasing amounts of oxygen in the earth’s atmosphere (from oxygenic photosynthesis) reacting with atmospheric methane. During this time the sun was much weaker, so the earth would have frozen earlier had it not been for methanogenesis contributing greenhouse gases to keep the earth warm.
+ 2.7 billion years ago
Continued rise of atmospheric O2, believed to be due to the rise of cyanobacteria. This is also when one of the first glaciation events occurred, as microbial processes decreased the amount of CH4 in the atmosphere.
+ 2.2 billion years ago
There are some findings that hypothetically date life on land beginning as early as 2.2 billion years ago. However, microbial fossils are far from conclusive.
+ 2.1 billion years ago
The advent of the first complex (read, multicellular) organisms, or at least fossil evidence for them.
+ 1.3 billion years ago
Evidence of the first land fungi and microbes: not photosynthetic land plants. Photosynthetic land plants are only thought to have evolved around 400 million years ago.
+ 550,000,000 years ago
The Cambrian explosion, when most modern day animal phyla are thought to have evolved. This was a major diversification of complex life on the planet.
+ 200,000 years ago
The first record of Homo sapiens (us) in Africa.
Describe the dominant physical and chemical characteristics of Earth systems at the following waypoints:
The Hadean is generally established as 4.6 - 4 Gyr, and spans the period from the origin of earth as a planet to the origin of life on earth. During this time, meteorite bombardment levels were very high, and conditions on the surface were not well suited to life as we know it today. Initially, the earth was molten until after 4.5 billion years ago when the moon was formed. Nisbet et al. compare the Hadean earth to a “Norse Ice-Hades” with glacial temperatures interspersed with very high temperatures as a result of meteorite impacts. The oceans formed during this time period. Additionally, the early sun was fainter.
+ Archean
Life developed during the Archaean (4-2.5 Gyr). Volcanic activity was very high, and due to the advent of oxygenic photosynthesis, the oxygenation of the earth’s atmosphere began.
+ Precambrian
The Precambrian Supereon stretches from 4.6 Gyr - 0.56 Gyr, and contains the Hadean, Archean, and Proterozoic (but not the Phanerozoic) eons. The Earth went through a wide variety of physical and chemical characteristics during this time (see other sections for details).
+ Proterozoic
2.5-0.56 Gyr. Oxygenation of the earth’s atmosphere continued, finally reaching significant levels due to the proliferation of oxygenic photosynthesis. This established conditions necessary for the first complex and multicellular organisms. As the sun’s luminosity increases by 6% every billion years, the earth began to receive more heat from the sun. However, there is evidence that the earth cooled during this period due to changes in the chemical composition of the atmosphere, a hypothesis known as Snowball Earth. There were likely repeated cycles of glaciation.
+ Phanerozoic
0.56 Gyr to present day. This eon encompasses the development of land plants and land life, as well as the origins of most of the animal phyla recognized today. This comparatively short eon (considering the extent of the Precambrian supereon) also contains several planetary extinction events and the establishment of land masses as we know them today.
Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.
According to Falkowski et al., the primary geophysical and biochemical processes that create and sustain conditions for life on Earth are plate tectonics and atmospheric photochemical processes. These phenomena “supply substrates and remove products”, which is necessary for avoiding planetary thermodynamic equilibrium, at which point substrates essential for life on earth would be depleted. Abiotic and biotic processes are related in that together they establish the “average redox state” of the planet. The difference between abiotic and biotic processes is partially one of time scale: because biological enzymes can catalyze reactions, biotic processes can occur at greater speed, even if biotic and abiotic reactions are equally thermodynamically favorable. Additionally, biotic processes can drive oxidation based on photosynthesis, a unique energy transduction process.
According to a ResearchGate post, “An emergent property is a property which a collection or complex system has, but which the individual members do not have.” Thus, since earth’s redox state is a product of biotic and abiotic processes (e.g. feedback between microbial metabolism and geochemical events, such as the “snowball earth” phenomenon after the advent of oxygenic photosynthesis), it is an emergent property. Individual populations of microbes do not establish the earth’s redox state, but collectively their interactions with geochemical processes do.
To describe this process, I will use the example outlined in the Falkowski paper, that of the global nitrogen cycle, which before human intervention was run exclusively by microbes. In this case, the different reversible reactions (except for N2 to NH4+, which biologically occurs in only one direction) are mediated by different kinds of bacteria. These bacteria can be spatially separated, and use different forms of nitrogen as different kinds of substrates (as terminal electron acceptors, or as an electron donor in the case of nitrifying bacteria). The thermodynamic favorability of each reaction is also influenced by the availability of other substrates (oxygen, organic matter) to help overcome thermodynamic barriers to reversible electron flow.
The Falkowski paper describes how the nitrogen cycle partitions between different redox niches and microbial groups. The paper states that “[t]ypically, reduction and oxidation reactions are segregated in different organisms”, and this is especially true in the nitrogen cycle. Some bacteria fix nitrogen, i.e. convert N2 gas into NH4+. Nitrifiers (including archaea) oxidize ammonia to NO2-, and still others convert NO2- to NO3-. Finally, yet other bacteria reverse the cycle, using NO2- and NO3- as terminal electron acceptors in the absence of oxygen, thus re-forming N2. The Canfield paper describes the relationship between the nitrogen cycle and climate change. Specifically, N2O, a part of the nitrogen cycle, is a potent greenhouse gas.
Metabolic diversity is not as large as, say, diversity in “boutique” or nonessential genes specific to particular environments. This is because metabolic genes are essential, and make up components of “multimeric microbial machines”. Thus, they are more evolutionarily constrained than other kinds of genes due to their required interaction with other genes in the essential processes of energy transduction, DNA replication, et cetera. As a result, even genes encoding imperfect proteins and enzymes (the Falkowski paper uses the D1 protein of photosystem II as an example) remain evolutionarily conserved. However, as other “nonessential” genes are not constrained in this manner, the discovery of new protein families is directly correlated with the sheer volume of sampling performed. It is these genes that show the extent of the microbial diversity that has evolved over the last 4 billion years or so.
I chose the paper “A safe operating space for humanity” by Rockstrom et al.
What are the “planetary boundaries” that define environmental change in the Anthropocene? How do we identify and quantify these boundaries? How much “wiggle room” does humanity have to surpass these boundaries? How do the biogeochemical processes which mark the planetary boundaries affect one another?
This particular paper was really more of a review, or a compilation of previously published or estimated data incorporated into a new context. Therefore, there wasn’t really a methods section. They were very careful to explain the limitations of drawing larger conclusions from these larger scale data sets, and the limitations in collecting data concerning specific planetary boundaries. For example, in table 1, atmospheric aerosol loading and chemical pollution both had no data sets to display.
They identified nine planetary boundaries: “climate change; rate of biodiversity loss (terrestrial and marine); interference with the nitrogen and phosphorus cycles; stratospheric ozone depletion; ocean acidification; global freshwater use; change in land use; chemical pollution; and atmospheric aerosol loading”. The paper presents evidence that three of these boundaries have been overstepped.
To what degree do environmental perturbations affect each other? The paper mentions “long term reinforcing feedback processes” such as decreases in vegetation cover that contribute to further climate change. The paper states that “If one boundary is transgressed, then other boundaries are also under serious risk.” How would one quantify chemical pollution and atmospheric aerosol loading in a global setting? What is the timeline for the effects of transgressing environmental boundaries to become visible? Does it depend on the specific boundary that was crossed?
Victoria Panwala
Student Number: 14028147
Module 1 Writing Assignment
Prompt: “Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental transformations it provides.” Do you agree or disagree with this statement? Answer the question using specific reference to your reading, discussions and content from evidence worksheets and problem sets.
Microbial life exists all around us. Microbes persist up to 4 kilometers below our feet, and 20 kilometers above our heads(1). Indeed, it is difficult to find an environment on earth that has not been colonized in some way by microbial life. Thermophilic microbes can survive in temperatures up to 122 °C(2), and chemoautotrophs have even evolved to obtain energy from the oxidation of substances like hydrogen sulfide(3). Not only have microbes been involved in making the earth habitable for multicellular life, they are also responsible for creating the environmental conditions essential for life today. Processes such as the nitrogen cycle, the carbon cycle, the production of vitamins, and the decomposition of recalcitrant materials could not proceed at the same rates without input from microbes. Thus, if not for microbial life, humans would not, and could not, exist.
While the exact time frame of life’s appearance on our planet is hotly debated – geological evidence points to somewhere between 4 and 3.5 billion years ago – its microbial nature is not(4). Microbes existed for billions of years before complex multicellular life arose, and transformed the environment of the early earth. Before the advent of oxygenic photosynthesis carried out by cyanobacteria, the Archaean Earth was an anoxic place. Methanogens, likely archaea, created an atmosphere which contained much more methane than the one that exists today. The greenhouse effects of this methane exerted a warming effect on the early earth, compensating for a sun that was considerably dimmer(5). Once oxygenic photosynthesis became a prominent contributor to the oxygen content of the earth’s atmosphere, this altered environmental conditions considerably. For example, this phenomenon allowed for the formation of the ozone layer that we rely on for protection today(5). In fact, even now that multicellular eukaryotic plants are also capable of performing photosynthesis, heterotrophic respiration requiring oxygen negates most of the oxygenic input from terrestrial systems. It is only in marine environments, where photosynthesis is performed primarily by microbes, that a net source of oxygen is released to the earth’s atmosphere. This is because a small amount of organic matter synthesized by these microbes is “buried” in marine sediments, away from oxygen consuming heterotrophs(5). Thus, microbes in general, and oxygenic cyanobacteria in particular, are responsible for creating the conditions required for life as we know it.
Although nitrogen gas makes up about 78% of our atmosphere, biologically available nitrogen sources are in somewhat shorter supply. In order to be utilized by organisms (including humans), nitrogen gas must first be fixed from its inert N2 gas form. The processes of nitrification and denitrification (the latter of which closes the nitrogen cycle by once again creating inert N2 gas) are both catalyzed by microbes(6). Indeed, eukarya do not possess the nitrogenase gene required for nitrogen fixation; it is only present in the microbial domains of archaea and bacteria(7). Even in the well-studied cases of nitrogen fixing bacterial symbionts living in the root nodules of legumes, horizontal gene transfer has not distributed this particular metabolic pathway between the two organisms(7). Forced to obtain ever greater amounts of nitrogen for food-production purposes, humankind has resorted to the industrial Haber-Bosch process to produce ammonia-containing fertilizer. Nowadays, fossil fuel use and the Haber-Bosch process account for 45% of the annual nitrogen fixation on earth(6). This is not an insignificant nitrogen fixation flux, but it is still folly to assume that humans could control the global nitrogen cycle without input from microbial sources. Aquatic environments still rely on microbial nitrogen-fixers to provide usable forms of nitrogen for primary production. Due to the importance of this environment as a carbon sink(8), the disappearance of this microbial pathway would have a large effect on oceanic production, and the global carbon cycle. Therefore, even though we are capable of fixing our own nitrogen, the regulation of the nitrogen cycle by prokaryotes is essential for our survival as a species.
The carbon cycle is yet another process that is fundamentally controlled by microbial processes. Microbial primary producers (such as cyanobacteria) and heterotrophic microbes serve as sinks for atmospheric CO2 and as sources of CO2 fluxes into the environment, respectively(11). Even though increased anthropogenic CO2 fluxes from fossil fuels are becoming ever more of a concern, microbes in the soil contribute a CO2 flux that is 10 times larger globally(11). This decomposition and respiration of organic matter allows for the transformation of this organic material into forms that are once again usable by other life-forms. This is especially true for complex organic polymers such as lignin, hemicellulose, and cellulose. Microbes produce and secrete extracellular enzymes that break down these complex structures so their components may be metabolized(11). Without this activity, there is no doubt that organic material would not turn over at the rate that we are accustomed to today. This would have an enormous impact on essential human processes such as crop production.
Humans and other organisms require a variety of essential micronutrients to perform basic metabolic processes. Often, these micronutrients, also known as vitamins, are precursors or cofactors of metabolic enzymes. While many microbes can synthesize these on their own, humans must obtain most vitamins exogenously(9,10). Some of these vitamins are obtained from the food that we eat, and absorbed by the human digestive tract. Still others are synthesized by our very own microbiota that persists in the gastrointestinal tract. Metagenomic studies of human gut commensals have shown that collectively, they possess many of the biosynthetic pathways required for vitamin synthesis. B vitamins in particular (folate, riboflavin, B12, niacin, pyridoxine, et cetera) are often synthesized by lactobacilli living in the human gut(10). In a world without microbes, these essential biosynthetic processes would be halted, and humans and other organisms would become rapidly vitamin-deficient.
The truth of the matter is that we humans cannot survive without the global biogeochemical processes performed by microbes. Although individually tiny, their contributions on a global scale shape the world around us today. However, the question remains, how long can this global microbial metabolism continue to sustain us as a species? Human input into biogeochemical cycles has fundamentally altered the nitrogen and carbon cycles of our planet. How much longer can microbes continue to maintain earth’s systems as we know them today? Thanks to their rapid evolutionary capacity and astounding metabolic and environmental diversity, microbes will certainly survive all but the most catastrophic of environmental alterations. But due to our much more stringent environmental requirements for survival, humanity will not.
References:
Whitman WB, Coleman DC, and Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12):6578-6583. PMC33863
Takai K, et al. 2008. Cell proliferation at 122°C and isotopically heavy CH4 production by a hyperthermophilic methanogen under high-pressure cultivation. PNAS. 105(31): 10949-51. Bibcode:2008PNAS..10510949T. doi:10.1073/pnas.0712334105. PMC 2490668
Nakagawa S, Takai K. 2008. Deep-sea vent chemoautotrophs: diversity, biochemistry, and ecological significance. FEMS Microbiology Ecology. 65(1):1-14. doi: 10.1111/j.1574-6941.2008.00502.x PMID:18503548
Nisbet EG, and Sleep NH. 2001. The habitat and nature of early life. Nature. 409: 1083-1091.
Kasting JF, and Siefert JL. 2003. Life and the Evolution of Earth’s Atmosphere. Science. 296: 1066-1067. (https://www.ncbi.nlm.nih.gov/pubmed?term=Science%5BJour%5D+AND+Life+and+the+Evolution+of+Earth's+Atmosphere&TransSchema=title&cmd=detailssearch)
Canfield DE, et al. 2010. The Evolution and Future of Earth’s Nitrogen Cycle. Science 330, 192 (2010);DOI: 10.1126/science.1186120
Falkowski P, et al. 2000. “The Global Carbon Cycle: A test of our knowledge of earth as a system.” Science’s Compass: Review. April 3, 2013.
Burkholder, P. R., & McVeigh, I. (1942). Synthesis of Vitamins by Intestinal Bacteria. Proceedings of the National Academy of Sciences of the United States of America, 28(7), 285-289.
LeBlanc, J. G., Milani, C., de Giori, G. S., Sesma, F., van Sinderen, D., & Ventura, M. (2013). Bacteria as vitamin suppliers to their host: A gut microbiota perspective. Current Opinion in Biotechnology, 24(2), 160-168. 10.1016/j.copbio.2012.08.005
Gougoulias, C., Clark, J. M., & Shaw, L. J. (2014). The role of soil microbes in the global carbon cycle: tracking the below-ground microbial processing of plant-derived carbon for manipulating carbon dynamics in agricultural systems. Journal of the Science of Food and Agriculture, 94(12), 2362-2371. http://doi.org/10.1002/jsfa.6577
Utilize this space to include a bibliography of any literature you want associated with this module. We recommend keeping this as the final header under each module.
An example for Whitman and Wiebe (1998) has been included below.
Whitman WB, Coleman DC, and Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12):6578–6583. PMC33863
Kasting JF, and Siefert JL. 2003. Life and the Evolution of Earth’s Atmosphere. Science. 296: 1066-1067. (https://www.ncbi.nlm.nih.gov/pubmed?term=Science%5BJour%5D+AND+Life+and+the+Evolution+of+Earth’s+Atmosphere&TransSchema=title&cmd=detailssearch)
Nisbet EG, and Sleep NH. 2001. The habitat and nature of early life. Nature. 409: 1083-1091.
Orndoroff RC, et al. 2007. Divisions of Geologic Time - Major Chronostratigraphic and Geochronologic Units. Fact Sheet 2007-3015. U.S. Department of the Interior and U.S. Geological Survey.
Falkowski PC, et al. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science 320, 1034 (2008);DOI: 10.1126/science.1153213
Canfield DE, et al. 2010. The Evolution and Future of Earth’s Nitrogen Cycle. Science 330, 192 (2010);DOI: 10.1126/science.1186120
Rockstrom J, et al. 2009. “A safe operating space for humanity” Nature.
Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.
(Reminders for how to format links, etc in RMarkdown are in the RMarkdown Cheat Sheets)
According to E. Stackebrandt, Woese initially identified 12 prokaryotic divisions. By 2003, 53 prokaryotic divisions had been recognized, of which 26 had no cultured representatives. Perhaps some have been cultured by now though.
In 2016, there were around 89 bacterial phyla and 20 archaeal phyla identified via small-subunit 16S rRNA databases. However, there could be up to 1,500 bacterial phyla, as there are many microbes that live in the “shadow biosphere”.
This question is difficult to analyze because not all of these metagenome sequencing projects are stored in the same databases. There is GenBank, EBI, and some others. The samples are sourced from essentially every earthly environment, with some common ones being the human gut, sediments, soil, et cetera. Good candidates for metagenomic studies are complex environments which contain members that are hard to grow in lab cultures.
There are thousands of metagenome sequencing projects, and this number is changing all of the time. According to the EBI database, there are 110217 sequencing projects stored there. Note: EBI stands for
Assembly: EULER; Binning: S-GCOM; Annotation: KEGG; Analysis pipelines: MEGAN 5; Databases: IMG/M, MG-RAST, NCBI. Note: there are many levels of curation in different databases.
Standalone software: OTUbase; Analysis pipelines: SILVA; Denoising: AmpliconNoise; Databases: Ribosomal Database Project (RDP).
Phylogenetic: vertically transferred genes that carry phylogenetic (taxonomic) information; ideally single copy.
Functional: more subject to horizontal gene transfer; identify specific biogeochemical functions associated with measured effects.
The process of grouping sequences that come from a single genome.
Types of algorithms: 1) align sequences to a database; 2) group sequences with each other based on DNA characteristics such as GC content and codon usage.
Risks and opportunities of binning: Risks: incomplete coverage of the genome sequence, and contamination from a different phylogeny, since some species can have similar DNA characteristics. Also, there is heterogeneity within species (for example, E. coli).
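As a rough illustration of the second type of algorithm (grouping by DNA characteristics), here is a minimal sketch that bins contigs by GC content alone. The contig sequences, the `gc_content` helper, and the 50% cutoff are all made up for this example; real binning tools combine several signals (GC content, codon usage, coverage) rather than a single threshold.

```r
# Fraction of G and C bases in a DNA sequence (case-insensitive).
gc_content <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  sum(bases %in% c("G", "C")) / length(bases)
}

# Hypothetical contigs, invented for illustration only.
contigs <- c(
  contig_a = "ATATTTAAGCATATTA",
  contig_b = "GGCCGCGGATCCGGCG",
  contig_c = "TTAAATATCGATTTAA",
  contig_d = "CCGGGCGCATGCCGCC"
)

# GC fraction of every contig.
gc <- sapply(contigs, gc_content)

# Assumed cutoff of 50% GC, chosen only to separate the toy contigs.
bins <- ifelse(gc < 0.5, "low_GC_bin", "high_GC_bin")

# List the contig names that fall into each bin.
split(names(contigs), bins)
```

This also shows the contamination risk mentioned above: two unrelated organisms with similar GC content would land in the same bin.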
Functional screens (biochemical, etc.); third-generation sequencing (nanopore), where essentially you sequence one whole genome at a time; single-cell sequencing (flow cytometry, then sequence); FISH probes.
Can you express a proteorhodopsin system in E. coli and make it respond to light?
Why is this proteorhodopsin pathway so widely distributed in bacteria in the ocean?
What are the functions of the proteorhodopsin system and its individual components in vivo, and how are they different from the predicted functions?
Functional screening of a large fosmid library generated from a planktonic sample
HPLC analysis was used to identify the pigments in the PR systems
Clones were analyzed for proton pumping activity of the PR system
Two complete genetically distinct PR systems derived from the fosmid library were expressed in E. coli
The functions of the gene products of the pathway were elucidated using biochemical methods.
This provided evidence that a single genetic transfer event can introduce a complete PR photosystem, which in turn explains why this particular pathway is distributed in so many bacteria and archaea.
Isn’t there an easier way to identify the PR system in a metagenomic sample than functional screening?
How much is the PR system expressed in the bacteria that possess it in the ocean? In the organisms in which it is expressed, to what degree does it contribute to metabolism?
Is there a genetic background more predisposed to the horizontal gene transfer of this system?
The paper mentions that PR expression in marine bacteria may benefit the bacterium in ways not correlated with increased growth rates and yields. What would be this benefit? Would they be more adaptable to new environments?
Why is the PR system more widely distributed in marine bacteria?
What causes the expression of the PR system in marine bacteria?
Whitman WB, Coleman DC, and Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12):6578–6583. PMC33863
Wooley J, et al. 2009. A Primer on Metagenomics. PLOS Computational Biology. Volume 6, Issue 2, e1000667
Madsen EL. 2005. Identifying microorganisms responsible for ecologically significant biogeochemical processes. Nature Reviews: Microbiology. Opinion. Volume 3, May 2005.
Stackebrandt, E. (2012). Molecular identification, systematics, and population structure of prokaryotes. Place of publication not identified: Springer.
https://www.ebi.ac.uk/services
Martinez et al., PNAS 2007.pdf “Bacterial Rhodopsin Gene Expression”
General Questions:
How different are bacteria genetically, even those that technically belong to the same species? What defines a species? What proteins are shared between these three strains of E. coli? Can we use this information to infer a bit about the evolutionary history of each strain? What about island genes and horizontal transfer? Can we use the different codon usage patterns of island genes to infer whether the island genes were horizontally transferred?
The methods were essentially cloning, sequencing, and then sequence analysis and annotation. They created whole genome libraries from genomic DNA of the three strains and then sequenced the clones. They used the programs MAGPIE and GLIMMER to annotate the genome and find the ORFs, and then used BLAST to find the predicted protein products. They also analyzed different codon usage patterns to identify island genes gained by horizontal gene transfer (a long time ago).
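To make the codon usage idea concrete, here is a minimal sketch of how two genes could be compared by their codon frequencies. This is not the method from the paper: the two sequences, the `codon_usage` and `usage_distance` helpers, and the distance measure (sum of absolute differences in codon frequency) are all assumptions made for illustration.

```r
# Relative codon frequencies for an in-frame coding sequence.
codon_usage <- function(dna) {
  dna <- toupper(dna)
  n_codons <- nchar(dna) %/% 3
  starts <- seq(1, by = 3, length.out = n_codons)
  codons <- substring(dna, starts, starts + 2)
  table(codons) / n_codons
}

# Sum of absolute differences in codon frequency between two genes
# (0 = identical usage, 2 = no codons in common).
usage_distance <- function(a, b) {
  all_codons <- union(names(a), names(b))
  fa <- setNames(numeric(length(all_codons)), all_codons)
  fb <- fa
  fa[names(a)] <- as.numeric(a)
  fb[names(b)] <- as.numeric(b)
  sum(abs(fa - fb))
}

# Hypothetical genes: all three encode Leu and Glu residues,
# but the "island" gene uses different synonymous codons.
backbone_1 <- codon_usage("CTGCTGGAACTGGAA")
backbone_2 <- codon_usage("CTGGAACTGCTGGAA")
island     <- codon_usage("TTATTAGAGTTAGAG")

usage_distance(backbone_1, backbone_2)  # same codon preferences
usage_distance(backbone_1, island)      # very different codon preferences
```

A gene whose codon usage sits far from the rest of the genome is a candidate for having arrived by horizontal transfer, although recently acquired genes are easier to spot this way than ancient ones.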
There is considerable variation among these E. coli lineages or strains, which allows them to occupy different ecological niches. This indicates that these extraintestinal E. coli have evolved fairly independently from one another.
Interestingly, there are some “universal insertion targets” in the E. coli genome, that are the site for insertion of DNA from horizontal gene transfer (even though the genes transferred differ between the strains) for all of the strains surveyed. Specifically, it is interesting that these sites are more likely to incorporate foreign DNA than elsewhere in the genome.
How do we define E. coli as a species? Does the fact that these organisms have essentially evolved to occupy different biological niches indicate that they should be characterized as different species?
Some common signatures used in the paper were type III secretion systems, partial prophage genomes, fimbrial adhesins, iron sequestration systems, autotransporters, and phase-switch recombinases. They also used patterns of island appearance versus backbone DNA (identified using different codon usage patterns).
An ecotype is a distinct form of a species that occupies a specific niche. For example, in the context of E. coli and the human body, enterohemorrhagic and uropathogenic strains of E. coli are different ecotypes because they inhabit different “environments” or niches in the human body (urinary tract vs gastrointestinal system).
In the diagram, CFT073 appears to possess a variety of pap proteins that the enterohemorrhagic strain EDL933 does not possess. This is likely because pap proteins are used to assemble a pilus necessary for attachment of E. coli to host cell surfaces. This pilus is necessary in uropathogenic E. coli like CFT073 because, in order to cause infection, they must resist the mechanical force of the urine stream. Enterohemorrhagic E. coli do not necessarily require this “sticky” phenotype in order to cause infection. Therefore, I would hypothesize that these pap genes were obtained during a horizontal gene transfer event early in the evolutionary history of the CFT073 strain, and would be common to other strains of uropathogenic E. coli. That, or the pap pilus genes are backbone genes that have been deleted from the EDL933 genome due to lack of use.
#To make tables
library(kableExtra)
library(knitr)
#To manipulate and plot data
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1 ✔ purrr 0.2.4
## ✔ tibble 1.4.2 ✔ dplyr 0.7.4
## ✔ tidyr 0.8.0 ✔ stringr 1.2.0
## ✔ readr 1.1.1 ✔ forcats 0.2.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
sample_data1 = data.frame(
number = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14),
name = c("vine", "bricks", "skittles", "mike & ikes", "gummy bears", "M & Ms", "Hershey Kisses", "Sour bear", "Sour fruit", "Sour hexa", "Sour bottle", "Sour swirl", "Jujubes", "wine candy"),
characteristics = c("red vines", "candy lego bricks", "not m and ms", "mike and ikes", "bear shaped", "not skittles", "foil wrapped", "bear shaped and sour", "sour and a fruit", "sour and hexagon shaped", "sour and bottle shaped", "sour and swirly", "honestly not sure what these look like", "wine shaped but not alcoholic"),
occurences = c(14, 18, 187, 174, 101, 241, 16, 3, 2, 6, 3, 3, 24, 9)
)
sample_data1 %>%
kable("html") %>%
kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
| number | name | characteristics | occurences |
|---|---|---|---|
| 1 | vine | red vines | 14 |
| 2 | bricks | candy lego bricks | 18 |
| 3 | skittles | not m and ms | 187 |
| 4 | mike & ikes | mike and ikes | 174 |
| 5 | gummy bears | bear shaped | 101 |
| 6 | M & Ms | not skittles | 241 |
| 7 | Hershey Kisses | foil wrapped | 16 |
| 8 | Sour bear | bear shaped and sour | 3 |
| 9 | Sour fruit | sour and a fruit | 2 |
| 10 | Sour hexa | sour and hexagon shaped | 6 |
| 11 | Sour bottle | sour and bottle shaped | 3 |
| 12 | Sour swirl | sour and swirly | 3 |
| 13 | Jujubes | honestly not sure what these look like | 24 |
| 14 | wine candy | wine shaped but not alcoholic | 9 |
No matter how many samples you take, you can never say for certain that you have collected a sample that accurately portrays the diversity of the environment. All you can do is reduce the chance that you have taken an unrepresentative sample, by taking multiple (large) samples. Given the size of the metagenomic data set, I would say that the majority of different species possible were sampled.
sample_data2 = data.frame(
x = c(1,2,3,4,5,6,7,8,9,10),
y = c(1,2,3,4,4,5,5,5,6,6)
)
ggplot(sample_data2, aes(x=x, y=y)) +
geom_point() +
geom_smooth() +
labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'
Note: did not have the data for this portion. I just included the code so that I would have it for future reference. However, I would say that if the curve flattens out, you have taken a large enough sample of your environment for it to be fairly representative of said environment.
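For future reference, here is a minimal sketch of how the collector's curve values could be built from raw data rather than typed in by hand. The `individuals` vector is a made-up classification order, not class data; the idea is simply that a species counts as "new" the first time its name appears.

```r
# Hypothetical order in which individual candies were classified.
individuals <- c("skittles", "M & Ms", "skittles", "vine", "M & Ms",
                 "gummy bears", "skittles", "Jujubes", "M & Ms", "vine")

collectors_curve <- data.frame(
  # x axis: cumulative number of individuals classified
  individuals_classified = seq_along(individuals),
  # y axis: cumulative number of distinct species seen so far
  species_observed = cumsum(!duplicated(individuals))
)

collectors_curve
```

The resulting data frame can be passed to the same `ggplot()` call used above, with `individuals_classified` and `species_observed` in place of `x` and `y`.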
Diversity: Simpson reciprocal index calculation. Simpson reciprocal index for the total community:
# Proportion of each species; the denominator is the total number of individuals in the table (801)
species1 = 14/(801)
species2 = 18/(801)
species3 = 187/(801)
species4 = 174/(801)
species5 = 101/(801)
species6 = 241/(801)
species7 = 16/(801)
species8 = 3/(801)
species9 = 2/(801)
species10 = 6/(801)
species11 = 3/(801)
species12 = 3/(801)
species13 = 24/(801)
species14 = 9/(801)
1 / (species1^2 + species2^2 + species3^2 + species4^2 + species5^2 + species6^2 + species7^2 + species8^2 + species9^2 + species10^2 + species11^2 + species12^2 + species13^2 + species14^2)
## [1] 4.75165
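The same calculation can be written more compactly from a vector of counts. This is a minimal sketch, with `simpson_reciprocal` being a helper name I made up; taking the denominator from `sum(counts)` guarantees that the proportions add up to 1. The counts are the occurrences from the table above.

```r
# Simpson reciprocal index: 1 / sum(p_i^2), where p_i is the
# proportion of individuals belonging to species i.
simpson_reciprocal <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

# Occurrences of each candy "species" in the total community.
community_counts <- c(14, 18, 187, 174, 101, 241, 16, 3, 2, 6, 3, 3, 24, 9)

sum(community_counts)                 # total number of individuals
simpson_reciprocal(community_counts)  # Simpson reciprocal index
```

This is the same quantity that `diversity(..., index = "invsimpson")` in vegan computes, so the two should agree for the same counts.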
Simpson Reciprocal index for a smaller sample of that community: (sample)
# Proportion of each species in the sample; the counts listed here sum to 171
species1 = 2/(171)
species2 = 5/(171)
species3 = 37/(171)
species4 = 30/(171)
species5 = 19/(171)
species6 = 64/(171)
species7 = 2/(171)
species8 = 0/(171)
species9 = 1/(171)
species10 = 0/(171)
species11 = 0/(171)
species12 = 0/(171)
species13 = 8/(171)
species14 = 3/(171)
1 / (species1^2 + species2^2 + species3^2 + species4^2 + species5^2 + species6^2 + species7^2 + species8^2 + species9^2 + species10^2 + species11^2 + species12^2 + species13^2 + species14^2)
## [1] 4.279379
Richness: Chao1 richness estimator of entire community
# Chao1 = S_obs + F1^2 / (2 * F2): 14 observed species, 0 singletons, 1 doubleton (Sour fruit)
14 + 0^2/(2*1)
## [1] 14
Chao1 Richness estimator of sample: (smaller sample of the community)
# Chao1 = S_obs + F1^2 / (2 * F2): 10 observed species, 1 singleton, 2 doubletons in the sample
10 + 1^2/(2*2)
## [1] 10.25
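A minimal sketch of the Chao1 estimator as a function of the raw counts, so that the number of observed species, singletons, and doubletons are counted by R instead of by hand. The `chao1` helper name is my own, and the fallback used when there are no doubletons is the bias-corrected form, which I am assuming is an acceptable substitute here. The counts are the community and sample counts used in the Simpson calculations above.

```r
# Chao1 = S_obs + F1^2 / (2 * F2), where
#   S_obs = number of species observed at least once,
#   F1    = number of species seen exactly once (singletons),
#   F2    = number of species seen exactly twice (doubletons).
chao1 <- function(counts) {
  s_obs <- sum(counts > 0)
  f1 <- sum(counts == 1)
  f2 <- sum(counts == 2)
  if (f2 == 0) {
    # Bias-corrected form, used to avoid dividing by zero doubletons.
    return(s_obs + f1 * (f1 - 1) / 2)
  }
  s_obs + f1^2 / (2 * f2)
}

community_counts <- c(14, 18, 187, 174, 101, 241, 16, 3, 2, 6, 3, 3, 24, 9)
sample_counts    <- c(2, 5, 37, 30, 19, 64, 2, 0, 1, 0, 0, 0, 8, 3)

chao1(community_counts)  # richness estimate for the whole community
chao1(sample_counts)     # richness estimate for the smaller sample
```

Species with a count of zero in the sample do not contribute to S_obs, which is why the sample estimate starts from the number of species actually seen rather than from the richness of the full community.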
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.4-6
sample_data1_diversity =
sample_data1 %>%
select(name, occurences) %>%
spread(name, occurences)
sample_data1_diversity
## bricks gummy bears Hershey Kisses Jujubes M & Ms mike & ikes skittles
## 1 18 101 16 24 241 174 187
## Sour bear Sour bottle Sour fruit Sour hexa Sour swirl vine wine candy
## 1 3 3 2 6 3 14 9
diversity(sample_data1_diversity, index="invsimpson")
## [1] 4.75165
specpool(sample_data1_diversity)
## Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All 14 14 0 14 0 14 14 0 1
The measure of diversity depends on the definition of species in your samples because diversity estimates are calculated using the number of species that you find. If you use a different species definition when processing your sample (e.g. 98% genetic similarity vs something like 90%), you will get different numbers for your Simpson reciprocal index and Chao1 richness estimator. This would change your collector's curve too.
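To illustrate how the similarity cutoff changes the number of "species", here is a minimal sketch that clusters five sequences at 98% and at 90% similarity. The dissimilarity matrix is invented for this example, and average-linkage clustering with `hclust()` is just one assumed way of forming groups, not the method of any particular pipeline.

```r
# Hypothetical pairwise dissimilarities (1 - fractional identity)
# among five sequences; values are invented for illustration.
seq_names <- paste0("seq", 1:5)
d <- matrix(c(
  0.00, 0.01, 0.05, 0.20,  0.21,
  0.01, 0.00, 0.06, 0.20,  0.22,
  0.05, 0.06, 0.00, 0.19,  0.20,
  0.20, 0.20, 0.19, 0.00,  0.015,
  0.21, 0.22, 0.20, 0.015, 0.00
), nrow = 5, byrow = TRUE, dimnames = list(seq_names, seq_names))

# Average-linkage hierarchical clustering on the dissimilarities.
tree <- hclust(as.dist(d), method = "average")

# Cut the tree at 2% dissimilarity (98% similarity) and at 10% (90% similarity).
groups_98 <- cutree(tree, h = 0.02)
groups_90 <- cutree(tree, h = 0.10)

length(unique(groups_98))  # number of groups at the 98% cutoff
length(unique(groups_90))  # number of groups at the 90% cutoff
```

With these toy values the stricter cutoff splits the sequences into more groups than the looser one, so richness (and every index built on it) depends on where the line is drawn.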
Yes, you could define species as both the type of candy (which they did) AND the color of the candy. So a blue skittle would be considered a different species than a green skittle.
Different sequencing technologies could influence observed diversity in a sample due to inherent bias when it comes to sequencing the data. For example, if your sequencing technology includes amplifying all sequences using universal primers, you will inevitably skew your data to favor some sequences because there is no such thing as a truly universal primer.
Module 3 Writing Assignment: What’s in a [species] name?
Introduction
In Shakespeare’s tragedy Romeo and Juliet, Juliet famously utters the oft-quoted line “What’s in a name? That which we call a rose, [b]y any other name would smell as sweet”(1). Although this text is not often associated with the concept of taxonomy, Juliet makes a point that applies to the world beyond the insular society of medieval Verona. Indeed, the concept to which Shakespeare poetically refers has long been a topic of investigation for cognitive psychologists the world over. In the field of cognitive psychology, there is a concept termed “linguistic relativism”, which hypothesizes that the structure of language directly affects the cognition of those who use it(2). To examine this concept in terms of taxonomy, the fact that we assign specific names and taxonomic identifications to genetically distinct organisms directly affects how we consider those organisms to be different from one another. If we accept this theory to be true, the practice of taxonomic classification becomes especially important. Taxonomy was first conceived as a way of classifying multicellular organisms based upon shared characteristics. While this is a relatively easy proposition when one is separating, say, a zebra from a muskrat, it becomes trickier when one ventures into the world of microbes. Should we try to force a multicellular paradigm to fit the prokaryotic biosphere? That is to say, should microbes be classified into species? Although we need some way of separating microbes into groups based upon genetic similarity, it is debatable whether the established taxonomic system is the best way to do so. The taxonomic system as it exists now fails to accurately depict the full extent of microbial diversity, largely due to the phenomenon of horizontal gene transfer.
Microbial species definitions
At a multicellular level, a species is often defined as a group of organisms with similar physical characteristics that can breed and produce fertile offspring. This definition must be modified when it is applied to microbes, which typically neither have readily observable physical characteristics nor “breed” in the traditional sense. The rapid asexual division performed by prokaryotes ensures a pool of genetic material which vastly outstrips that of multicellular organisms in terms of sheer diversity. Thus, in order to apply the taxonomic system at a microscopic level, microbial ecologists needed to create a new set of defining characteristics of species. It comes as no surprise, therefore, that the prokaryotic species definition can vary from scientist to scientist. Wayne et al., in one of the earliest attempts to establish a bacterial species definition, define a bacterial species as “a collection of strains that are characterized by at least one diagnostic phenotypic trait and whose purified DNA molecules show at least 70% [DNA] cross-hybridization”(4,5). Other researchers have defined prokaryotic species as sharing rRNA sequences that are 95% or 97% similar(5). Further complicating matters are existing environmental sampling methods for microbial populations. The overwhelming majority of microbial “species” cannot be grown in culture. Thus, the only sources of information we have about them are fragments of their genomes that can be isolated from samples of their environments(3). This paucity of data is one of the reasons that the two major bioinformatic pipelines for the analysis of 16S rRNA amplicon data do not classify identified prokaryotes in terms of species, but rather using operational taxonomic units (OTUs) and amplicon sequence variants (ASVs).
Phylogenetic relationships among prokaryotes
One of the benefits of establishing a taxonomic identification system is that it allows us to infer phylogenetic relationships between different species. As mutations occur at a relatively constant rate, it is only logical that organisms with higher degrees of genetic similarity would be more closely related evolutionarily. While this idea is sound in principle, it overlooks the role of horizontal gene transfer among microbes. Among prokaryotes, genetic information can be taken up from the environment, carried into the cell by a phage, or even transferred directly from cell to cell as a plasmid via a conjugative pilus. Through horizontal gene transfer, entire metabolic pathways have been shared between diverse groups of bacteria and archaea, as has been hypothesized to be the case for specific sulfate respiration pathways(6). Indeed, on the early Earth, there is molecular evidence of such promiscuous gene flow that communal evolution was likely the primary method of adaptation(6).
This promiscuous sharing of genetic information has obvious implications when it comes to establishing phylogenies. If two bacteria share a specific gene or metabolic pathway, they could be directly evolutionarily related, or the shared trait could merely be the product of a horizontal gene transfer that occurred earlier in each respective strain’s history. When drawing a phylogenetic tree of a microbial organism, the lines that represent the inheritance of genetic material don’t just run up and down, but horizontally as well. For this reason, the entire phylogenetic species concept, and the taxonomic system that depends upon it, must be restructured to account for this property of the prokaryotic biosphere.
Conclusion
In order to accurately describe the genetic relationships between microbes, we have two options. Either we create a new taxonomic identification system specifically tailored to the microbial world, or we make it understood that the microbial species definition is fundamentally different from that of multicellular organisms. No matter how descriptive a given taxonomic system is, in the end, it is entirely arbitrary. No living species is static, or fits neatly into a taxonomic box. Prokaryotes are merely the most obvious of outliers, due to their twin properties of rapid reproduction and horizontal gene transfer. Even when passaging known species on laboratory media rather than in their “natural environment”, large-scale genetic differences can be observed on a human timescale. Bordetella pertussis, for example, has been found to rearrange significant portions of its genome after as few as 12 passages on laboratory media(7). Imagine how different laboratory-adapted Escherichia coli must be after decades of laboratory maintenance. Consider, too, the sheer volume of microbial diversity. Microbes were the first living things to evolve, and have benefited from anywhere between 3.5 and 4 billion years of evolution. Multicellular organisms, on the other hand, only evolved around 600 million years ago. It is only to be expected that microbial diversity far outstrips multicellular diversity, commensurate with the billions of extra years it has had to evolve. Thus, to create a clear definition of a microbial species would be an exercise in futility.
References:
Gibson, R., & Shakespeare, W. (2006). Shakespeare, Romeo and Juliet. Cambridge: Cambridge Univ. Press.
Gleitman, L., & Papafragou, A. (n.d.). Relations Between Language and Thought. In Decision Making (pp. 504-523). doi:https://cpb-us-west-2-juc1ugur1qwqqqo4.stackpathdns.com/web.sas.upenn.edu/dist/4/81/files/2017/07/Gleitman-Papafragou-2013_Relations-between-language-and-thought-19a33dc.pdf
Nichols, D. et al. Use of ichip for high-throughput in situ cultivation of “uncultivable” microbial species. Appl. Environ. Microbiol. 76, 2445–2450 (2010)
Wayne L.G, et al. Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. Int. J. Syst. Bacteriol. 1987;37:463–464.
Falkowski, P., Fenchel, T., & Delong, E. (2008). The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science Special Reviews,320.
Brinig, M. M., Cummings, C. A., Sanden, G. N., Stefanelli, P., Lawrence, A., & Relman, D. A. (2006). Significant Gene Order and Expression Differences in Bordetella pertussis Despite Limited Gene Content Variation. Journal of Bacteriology,188(7), 2375-2382. doi:10.1128/jb.188.7.2375-2382.2006
Rockstrom J, et al. 2009. “A safe operating space for humanity” Nature.
Welch RA, Burland V, Plunkett G, et al. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proceedings of the National Academy of Sciences of the United States of America. 2002;99(26):17020-17024. doi:10.1073/pnas.252529799.
• Comment on the creative tension between gene loss, duplication and acquisition as it relates to microbial genome evolution
The gain and loss of genes, and the diversity that arises from this paradigm, is mediated by natural selection. If you don’t acquire specialized genes to survive in diverse environments, you will not be able to survive, period. This drives both the gain of unique genetic elements and the loss of genetic elements no longer required for survival in said environment: genetic dead weight.